Airway segmentation is crucial for the examination, diagnosis, and prognosis of lung diseases, while its manual delineation is unduly burdensome. To alleviate this time-consuming and potentially subjective manual procedure, researchers have proposed methods to automatically segment airways from computed tomography (CT) images. However, some small-sized airway branches (e.g., bronchi and terminal bronchioles) significantly aggravate the difficulty of automatic segmentation by machine learning models. In particular, the variance of voxel values and the severe data imbalance in airway branches make computational modules prone to discontinuous and false-negative predictions. Attention mechanisms have demonstrated the capacity to segment complex structures, while fuzzy logic can reduce the uncertainty in feature representations. Therefore, integrating deep attention networks with fuzzy theory, by way of a fuzzy attention layer, should provide an upgraded solution. This paper presents an efficient method for airway segmentation, comprising a novel fuzzy attention neural network and a comprehensive loss function to enhance the spatial continuity of airway segmentation. The deep fuzzy set is formulated by a set of voxels in the feature map and a learnable Gaussian membership function. Different from existing attention mechanisms, the proposed channel-specific fuzzy attention addresses the issue of heterogeneous features in different channels. Furthermore, a novel evaluation metric is proposed to assess both the continuity and completeness of airway structures. The efficiency of the proposed method has been demonstrated by testing on open datasets, including the EXACT'09 and LIDC datasets, as well as our in-house COVID-19 and fibrotic lung disease datasets.
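No code accompanies this abstract, so the following is a minimal PyTorch sketch of one plausible reading of the channel-specific fuzzy attention: each channel owns a learnable Gaussian membership function whose membership degree over voxels acts as the attention weight. The class name `FuzzyAttentionLayer` and every parameter choice here are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FuzzyAttentionLayer(nn.Module):
    """Hypothetical sketch: one learnable Gaussian membership function
    (mean, spread) per channel; each voxel's membership degree is used
    as its attention weight."""

    def __init__(self, channels: int):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(channels))
        self.sigma = nn.Parameter(torch.ones(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, D, H, W) feature map from a 3D encoder.
        mu = self.mu.view(1, -1, 1, 1, 1)
        sigma = self.sigma.view(1, -1, 1, 1, 1)
        # Gaussian membership degree of each voxel, in (0, 1].
        membership = torch.exp(-((x - mu) ** 2) / (2 * sigma ** 2 + 1e-6))
        # Attend: rescale features by their membership degree.
        return x * membership

# Usage on a dummy CT feature map:
feat = torch.randn(2, 16, 8, 32, 32)
out = FuzzyAttentionLayer(channels=16)(feat)
print(out.shape)  # torch.Size([2, 16, 8, 32, 32])
```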
The turmoil caused by the advent of COVID-19 has continued to bring new challenges over the past two years. During the COVID-19 pandemic, rapid identification of infected patients and specific delineation of the infected regions in computed tomography (CT) images are needed. Although deep supervised learning methods have been established quickly, the scarcity of both image-level and pixel-level labels, as well as the lack of explainable transparency, still hinders the applicability of AI. Can we identify infected patients and delineate the infections with extremely limited supervision? Semi-supervised learning has demonstrated promising performance under limited labeled data and sufficient unlabeled data. Inspired by semi-supervised learning, we propose a model-agnostic calibrated pseudo-labelling strategy and apply it under a consistency regularization framework to generate explainable identification and delineation results. We demonstrate the effectiveness of our model with a combination of limited labeled data and sufficient unlabeled or weakly labeled data. Extensive experiments show that our model can efficiently utilize the limited labeled data and provide explainable classification and segmentation results for decision-making in clinical routine. The code is available at https://github.com/ayanglab/xai covid-11.
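There is no implementation detail in the abstract, so here is a minimal PyTorch sketch of the general pattern: confidence-thresholded pseudo-labelling under consistency regularization, with plain temperature scaling standing in for the paper's calibration strategy. The function name, the `temperature` and `threshold` values, and the weak/strong augmentation split are assumptions.

```python
import torch
import torch.nn.functional as F

def calibrated_pseudo_label_loss(model, weak_x, strong_x,
                                 temperature=1.5, threshold=0.9):
    """Sketch: pseudo-labels from weakly augmented inputs supervise
    predictions on strongly augmented views (consistency regularization).
    Temperature scaling stands in for the paper's calibration step."""
    with torch.no_grad():
        logits_w = model(weak_x)
        probs = F.softmax(logits_w / temperature, dim=1)  # calibrated confidences
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= threshold).float()  # keep only confident samples
    logits_s = model(strong_x)
    loss = F.cross_entropy(logits_s, pseudo, reduction="none")
    return (loss * mask).mean()

# Dummy usage with a toy classifier:
net = torch.nn.Linear(10, 3)
xw, xs = torch.randn(4, 10), torch.randn(4, 10)
print(calibrated_pseudo_label_loss(net, xw, xs))
```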
Magnetic resonance imaging (MRI) is an important non-invasive clinical tool that can produce high-resolution and reproducible images. However, high-quality MR images require long scanning times, which lead to exhaustion and discomfort of patients, inducing more artefacts from both voluntary movements and involuntary physiological motion. To accelerate the scanning process, methods based on k-space undersampling and deep-learning-based reconstruction have been popularized. This work introduces SwinMR, a novel Swin-transformer-based method for fast MRI reconstruction. The whole network consists of an input module (IM), a feature extraction module (FEM), and an output module (OM). The IM and OM are 2D convolutional layers, and the FEM is composed of a cascade of residual Swin transformer blocks (RSTBs) and 2D convolutional layers. Each RSTB consists of a series of Swin transformer layers (STLs). The shifted-window multi-head self-attention (W-MSA/SW-MSA) of the STL is performed in shifted windows rather than over the whole image space, as in the multi-head self-attention (MSA) of the original transformer. A novel multi-channel loss using the sensitivity maps is also proposed, which is shown to preserve more texture and details. We performed a series of comparative and ablation studies on the Calgary-Campinas public brain MR dataset and conducted downstream segmentation experiments on the Multi-modal Brain Tumour Segmentation Challenge 2017 dataset. The results show that our SwinMR achieves high-quality reconstruction compared with other benchmark methods, and that it is robust to different undersampling masks, noise interruption, and different datasets. The code is publicly available at https://github.com/ayanglab/SwinMR.
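The abstract does not give the formula for the multi-channel loss, so the following is a minimal sketch under the usual parallel-imaging assumption that each coil image is the product of a coil sensitivity map and the image; comparing reconstruction and ground truth per coil is what could plausibly preserve coil-local texture. The function name and the L1 distance are assumptions.

```python
import torch

def multichannel_loss(x_hat, x, sens_maps):
    """Sketch of a sensitivity-map-based multi-channel loss.
    x_hat, x:  (batch, H, W) complex reconstruction and ground truth
    sens_maps: (batch, coils, H, W) complex coil sensitivity maps"""
    coil_hat = sens_maps * x_hat.unsqueeze(1)  # per-coil reconstruction
    coil_ref = sens_maps * x.unsqueeze(1)      # per-coil ground truth
    return (coil_hat - coil_ref).abs().mean()  # L1 over all coils

# Dummy usage:
b, c, h, w = 2, 8, 64, 64
x_hat = torch.randn(b, h, w, dtype=torch.complex64)
x = torch.randn(b, h, w, dtype=torch.complex64)
smaps = torch.randn(b, c, h, w, dtype=torch.complex64)
print(multichannel_loss(x_hat, x, smaps))
```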
Benefiting from its intrinsic ability to exploit supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms keep existing algorithms from improving further. 1) The quality of positive samples heavily depends on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph with specially designed Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples, respectively. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
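A minimal PyTorch sketch of the objective described in the last step, assuming L2-normalized embeddings and mean aggregation over the negative centers; pairing each node with its own cross-view counterpart is a simplification of the paper's within-cluster positive selection, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F

def cluster_guided_contrastive_loss(z1, z2, labels, centers):
    """Sketch: pull together cross-view positive pairs from the same
    high-confidence cluster and push embeddings away from the centers
    of the other clusters, which serve as negatives.
    z1, z2:  (n, d) embeddings from the two views
    labels:  (n,) high-confidence cluster assignments
    centers: (k, d) cluster centers"""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    centers = F.normalize(centers, dim=1)
    pos = (z1 * z2).sum(dim=1)             # cross-view cosine similarity
    sims = z1 @ centers.t()                # similarity to every center, (n, k)
    neg_mask = F.one_hot(labels, centers.size(0)).logical_not()
    neg = (sims * neg_mask).sum(1) / neg_mask.sum(1)  # mean over other clusters
    return (neg - pos).mean()              # maximize pos, minimize neg

# Dummy usage:
z1, z2 = torch.randn(8, 16), torch.randn(8, 16)
labels, centers = torch.randint(0, 3, (8,)), torch.randn(3, 16)
print(cluster_guided_contrastive_loss(z1, z2, labels, centers))
```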
As one of the prevalent methods for building automation systems, Imitation Learning (IL) delivers promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment return, as the coefficient to build an importance map. We also conducted experiments to investigate three major questions: whether frames are equally important, how effective the importance map is, and how importance maps from different IL models relate to each other. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
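A toy sketch of the RISE-style loop described above, assuming a `train_and_eval(mask)` callback that retrains the black-box IL model on the frames kept by the mask and reports the resulting environment return; the final normalization is one plausible choice, not the paper's exact procedure.

```python
import numpy as np

def r2rise_importance(num_frames, num_iters, train_and_eval,
                      keep_prob=0.5, seed=0):
    """Sketch: accumulate return-weighted random masks into a
    per-frame importance map (RISE applied to retraining)."""
    rng = np.random.default_rng(seed)
    importance = np.zeros(num_frames)
    total = 0.0
    for _ in range(num_iters):
        mask = (rng.random(num_frames) < keep_prob).astype(float)
        ret = train_and_eval(mask)   # expensive: retrain + roll out policy
        importance += ret * mask
        total += ret
    return importance / max(total, 1e-8)

# Toy usage: pretend only frames 3..5 carry useful demonstrations.
toy_return = lambda m: float(m[3:6].sum())
print(r2rise_importance(10, 200, toy_return).round(2))
```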
Increasing research interest focuses on sequential recommender systems, which aim to model dynamic sequence representations precisely. However, the loss functions most commonly used in state-of-the-art sequential recommendation models have essential limitations. To name a few: Bayesian Personalized Ranking (BPR) loss suffers from the vanishing gradient problem caused by numerous negative samples and prediction biases; Binary Cross-Entropy (BCE) loss is sensitive to the number of negative samples, so it is likely to ignore valuable negative examples and reduce training efficiency; Cross-Entropy (CE) loss only attends to the last timestamp of the training sequence, which leads to low utilization of sequence information and inferior user sequence representations. To avoid these limitations, in this paper we propose to calculate a Cumulative Cross-Entropy (CCE) loss over the sequence. CCE is simple and direct, enjoying the virtues of painless deployment, no negative sampling, and effective, efficient training. We conduct extensive experiments on five benchmark datasets to demonstrate the effectiveness and efficiency of CCE. The results show that employing the CCE loss on three state-of-the-art models, GRU4Rec, SASRec, and S3-Rec, yields average improvements of 125.63%, 69.90%, and 33.24% in full-ranking NDCG@5, respectively. With CCE, the performance curve of the models on the test data rises rapidly with wall-clock time and stays above that of the other loss functions through almost the entire training process.
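Since CCE is described as cross-entropy accumulated over every timestamp of the sequence with a full softmax over the catalogue (hence no negative sampling), a minimal PyTorch sketch could look as follows; the padding convention and the function name are assumptions.

```python
import torch
import torch.nn.functional as F

def cumulative_cross_entropy(logits, targets, pad_id=0):
    """Sketch: sum cross-entropy over all timestamps instead of only
    the last one, averaging over the non-padded positions.
    logits:  (batch, seq_len, num_items) scores over the full catalogue
    targets: (batch, seq_len) next-item ids; pad_id marks padding"""
    flat_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    )
    mask = (targets.reshape(-1) != pad_id).float()
    return (flat_loss * mask).sum() / mask.sum()

# Dummy usage:
logits = torch.randn(4, 20, 1000)
targets = torch.randint(1, 1000, (4, 20))
print(cumulative_cross_entropy(logits, targets))
```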
Face Anti-spoofing (FAS) is essential to secure face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). To promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, comprising 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this setting, low image resolution and noise interference are the new challenges faced by surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality-variance distributions, helping the contrastive learning strategy obtain feature representations that are robust to quality variation (see the sketch below). (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
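As an illustration of point (2), here is a minimal sketch of how quality-variant positive pairs might be generated by downsampling and noise injection; the degradation recipe and all parameter values are assumptions, not the paper's pipeline.

```python
import torch
import torch.nn.functional as F

def quality_variant_pair(img, scale=0.25, noise_std=0.05):
    """Sketch: simulate the surveillance quality gap by downsampling,
    upsampling back, and adding noise, yielding a (high-quality,
    low-quality) positive pair for quality-invariant contrastive learning."""
    _, _, h, w = img.shape
    low = F.interpolate(img, scale_factor=scale, mode="bilinear",
                        align_corners=False)
    low = F.interpolate(low, size=(h, w), mode="bilinear",
                        align_corners=False)
    low = low + noise_std * torch.randn_like(low)
    return img, low

# Dummy usage on a batch of face crops:
hi_q, lo_q = quality_variant_pair(torch.rand(2, 3, 112, 112))
print(lo_q.shape)  # torch.Size([2, 3, 112, 112])
```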
Unsupervised domain adaptation (UDA) via deep learning has attracted considerable attention for tackling domain-shift problems caused by distribution discrepancies across different domains. Existing UDA approaches depend heavily on the accessibility of source-domain data, which is usually limited in practical scenarios due to privacy protection, data storage and transmission costs, and computational burden. To tackle this issue, many source-free unsupervised domain adaptation (SFUDA) methods have been proposed recently, which transfer knowledge from a pre-trained source model to an unlabeled target domain without access to the source data. A comprehensive review of these works on SFUDA is of great significance. In this paper, we provide a timely and systematic literature review of existing SFUDA approaches from a technical perspective. Specifically, we categorize current SFUDA studies into two groups, i.e., white-box SFUDA and black-box SFUDA, and further divide them into finer subcategories based on the learning strategies they use. We also investigate the challenges of the methods in each subcategory, discuss the advantages and disadvantages of white-box and black-box SFUDA methods, summarize the commonly used benchmark datasets, and review the popular techniques for improving the generalizability of models learned without source data. We finally discuss several promising future directions in this field.
Most existing text-video retrieval methods focus on cross-modal matching between the visual content of offline videos and textual query sentences. However, in real scenarios, online videos are frequently accompanied by relevant text information such as titles, tags, and even subtitles, which can be utilized to match textual queries. This inspires us to generate associated captions from offline videos to assist existing text-video retrieval methods. To do so, we propose to use a zero-shot video captioner built on pre-trained web-scale models (e.g., CLIP and GPT-2) to generate captions for offline videos without any training. Given the captions, one question naturally arises: what can auxiliary captions do for text-video retrieval? In this paper, we present a novel framework, Cap4Video, which exploits captions in three ways: i) Input data: the video and captions can form new video-caption pairs as data augmentation for training. ii) Feature interaction: we perform feature interaction between video and caption to yield enhanced video representations. iii) Output score: the Query-Caption matching branch can complement the original Query-Video matching branch for text-video retrieval (see the sketch below). We conduct thorough ablation studies to demonstrate the effectiveness of our method. Without any post-processing, Cap4Video achieves state-of-the-art performance on MSR-VTT (51.4%), VATEX (66.6%), MSVD (51.8%), and DiDeMo (52.0%).
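A minimal sketch of aspect iii), fusing Query-Video and Query-Caption cosine similarities with a scalar weight; the linear fusion and the `alpha` value are assumptions about one plausible instantiation, not the paper's exact scoring rule.

```python
import torch
import torch.nn.functional as F

def fused_retrieval_scores(q, v, c, alpha=0.5):
    """Sketch: the final text-video score is a weighted sum of the
    Query-Video and Query-Caption similarities.
    q: (nq, d) query embeddings
    v: (nv, d) video embeddings
    c: (nv, d) embeddings of the generated caption of each video"""
    q, v, c = (F.normalize(t, dim=1) for t in (q, v, c))
    qv = q @ v.t()  # query-video cosine similarity, (nq, nv)
    qc = q @ c.t()  # query-caption cosine similarity, (nq, nv)
    return alpha * qv + (1 - alpha) * qc

# Dummy usage: 5 queries against 10 videos.
q, v, c = torch.randn(5, 64), torch.randn(10, 64), torch.randn(10, 64)
print(fused_retrieval_scores(q, v, c).shape)  # torch.Size([5, 10])
```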
Deep neural networks are vulnerable to adversarial attacks. In this paper, we take the role of investigators who want to trace the attack and identify its source, that is, the particular model from which the adversarial examples were generated. The derived techniques would aid forensic investigation of attack incidents and serve as a deterrent to potential attacks. We consider the buyer-seller setting, where a machine learning model is distributed to various buyers and each buyer receives a slightly different copy with the same functionality. A malicious buyer generates adversarial examples from a particular copy $\mathcal{M}_i$ and uses them to attack other copies. From these adversarial examples, the investigator wants to identify the source $\mathcal{M}_i$. To address this problem, we propose a two-stage separate-and-trace framework. The model separation stage generates multiple copies of a model for the same classification task. This process injects unique characteristics into each copy so that the adversarial examples it generates have distinct and traceable features. We achieve this with a parallel structure that embeds a ``tracer'' in each copy, together with a noise-sensitive training loss. The tracing stage takes in adversarial examples and a few candidate models, and identifies the likely source. Based on the unique features induced by the noise-sensitive loss function, we can effectively trace the potential adversarial copy by considering the output logits from each tracer (see the sketch below). Empirical results show that it is possible to trace the origin of an adversarial example, and that the mechanism can be applied to a wide range of architectures and datasets.
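A toy sketch of the tracing stage, assuming each candidate copy exposes its logits and that the source copy's tracer reacts most strongly to its own adversarial examples; the mean max-logit statistic is a hypothetical stand-in for the paper's noise-sensitive tracer score.

```python
import torch

def trace_source(adv_examples, candidate_models):
    """Sketch: score each candidate copy by how strongly its tracer
    reacts to the adversarial examples, and return the index of the
    most-reactive copy as the likely source."""
    scores = []
    with torch.no_grad():
        for model in candidate_models:
            logits = model(adv_examples)  # (n, num_classes)
            scores.append(logits.max(dim=1).values.mean().item())
    return int(torch.tensor(scores).argmax())

# Toy usage with three linear 'copies':
copies = [torch.nn.Linear(8, 4) for _ in range(3)]
print(trace_source(torch.randn(16, 8), copies))
```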